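Quant Interview "Real Questions" Series: Issue 3
The WeChat official account 量化投资与机器学习 (Quantitative Investment and Machine Learning, QIML) is a leading independent media outlet focused on quantitative investing, hedge funds, Fintech, artificial intelligence, big data, and related fields. It has more than 300,000 followers from mutual funds, private funds, brokerages, futures firms, banks, insurers, universities, and other sectors, and has been named "Author of the Year" by the Tencent Cloud+ Community for two consecutive years.
In 2022, the QIML official account launched yet another brand-new series:
QIML has collected real interview questions from top global hedge funds and major internet companies. We hope this gives our readers a different job-hunting and learning experience!
Issue 3
▌Difficulty: Medium
Question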
If your Time-Series Dataset is very long, what architecture would you use?
Answer
For a very long time-series dataset, LSTMs are a natural choice because they process entire sequences of data rather than only single data points, and a time series is exactly such a sequence.
For even stronger representational capacity, stack several LSTM layers into a multi-layer LSTM.
Another approach for long time-series datasets is to use CNNs to extract information from the series.
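As a rough illustration of the CNN idea, the sketch below applies a single strided 1-D convolution to a long series, so that any downstream sequence model sees a much shorter input. The function name `conv1d_strided`, the hand-picked averaging kernel, and the synthetic series are all assumptions made for this example; a real CNN would learn its kernel weights from data.

```python
def conv1d_strided(series, kernel, stride):
    """Valid 1-D cross-correlation with a stride: one feature per window."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[start:start + k]))
        for start in range(0, len(series) - k + 1, stride)
    ]


# Synthetic long series (hypothetical data for illustration only).
long_series = [float(t % 10) for t in range(1000)]

# Hand-picked averaging kernel; a trained CNN would learn these weights.
avg_kernel = [0.25, 0.25, 0.25, 0.25]

features = conv1d_strided(long_series, avg_kernel, stride=4)
print(len(long_series), "->", len(features))
```

With a stride equal to the kernel length, each output summarizes one non-overlapping window, so the sequence shrinks by that factor before it reaches a recurrent layer.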
---
▌Difficulty: Medium
Question
How do you Normalise Time-Series Data?
Answer
Two normalization methods that are commonly used are:
Range-based Normalization: In range-based normalization, the minimum and maximum values of the time series are determined. Let these values be denoted by min and max, respectively. Then, the time series value $y_i$ is mapped to the new value $y_i'$ in the range [0, 1] as follows:
$$y_i' = \frac{y_i - \min}{\max - \min}$$
Standardization: In standardization, the mean and standard deviation of the series are used for normalization. This is essentially the Z-value of the time series. Let $\mu$ and $\sigma$ represent the mean and standard deviation of the values in the time series. Then, the time series value $y_i$ is mapped to a new value $z_i$ as follows:
$$z_i = \frac{y_i - \mu}{\sigma}$$
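The two mappings described above (min-max scaling and the z-score) can be sketched in a few lines of standard-library Python. The function names and the sample `prices` list are assumptions for illustration, and the choice of the population standard deviation (`pstdev`) rather than the sample version is also an assumption.

```python
from statistics import mean, pstdev


def min_max_normalize(series):
    """Map each value to (y - min) / (max - min), which lies in [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series: max - min is zero")
    return [(y - lo) / (hi - lo) for y in series]


def standardize(series):
    """Map each value to its z-score (y - mean) / std."""
    mu = mean(series)
    sigma = pstdev(series)  # population std; statistics.stdev gives the sample std
    if sigma == 0:
        raise ValueError("constant series: standard deviation is zero")
    return [(y - mu) / sigma for y in series]


prices = [10.0, 12.0, 11.0, 15.0, 14.0]  # hypothetical data
scaled = min_max_normalize(prices)
z = standardize(prices)
print(scaled)
print(z)
```

In a forecasting setting, the statistics (min, max, mean, standard deviation) are usually computed on the training window only and then reused on later data, to avoid look-ahead bias.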
---
▌Difficulty: Hard
Question
Can Hidden Markov Models be used to model Time-Series data?
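Answer
Yes, a time series can be fit with an HMM, provided some conditions hold:
The underlying state process should follow the Markov property, meaning the next state depends only on the current state.
There is some variance that other models are not able to capture (in other words, the system is only partially observable, with a hidden state behind the observed values).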
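To make the "partially observable" idea concrete, here is a minimal sketch of the forward algorithm, which computes the likelihood of an observed sequence under a discrete HMM by summing over all hidden-state paths. The two-state "calm/volatile" regime model and every probability in it are hypothetical values chosen only for illustration.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs) under a discrete HMM using the forward algorithm."""
    n_states = len(start_p)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emit_p[j][o] * sum(alpha[i] * trans_p[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical model: hidden state 0 = calm regime, 1 = volatile regime.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1],   # Markov property: next state depends only on current state
           [0.2, 0.8]]
emit_p = [[0.8, 0.2],    # observed symbol 0 = small move, 1 = large move
          [0.3, 0.7]]

obs = [0, 0, 1, 1, 0]
likelihood = forward_likelihood(obs, start_p, trans_p, emit_p)
print(likelihood)
```

Fitting the parameters themselves (for example with Baum-Welch) is a separate step that this sketch does not cover.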
---
▌Difficulty: Hard
Question
Explain briefly the different methods of Noise-Removal for Time-Series Data
Answer
Noise-prone hardware, such as sensors, is often used for time-series data collection. The approach used by most of the noise removal methods is to remove short-term fluctuations.
It should be pointed out that the distinction between noise and interesting outliers is often a difficult one to make. Two methods, referred to as binning and smoothing, are often used for noise removal.
Binning
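The source's own description of these two methods is not reproduced here, so the sketch below relies on their common definitions, which is an assumption: binning replaces each block of k consecutive points with the block average, and smoothing is shown as a trailing moving average over a window of k points. The function names and the sample `noisy` list are hypothetical.

```python
def bin_average(series, k):
    """Binning: replace each consecutive block of k points by its mean."""
    bins = [series[i:i + k] for i in range(0, len(series), k)]
    return [sum(b) / len(b) for b in bins]  # the last bin may be shorter than k


def moving_average(series, k):
    """Smoothing: trailing moving average over overlapping windows of k points."""
    return [sum(series[i - k + 1:i + 1]) / k for i in range(k - 1, len(series))]


noisy = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]  # hypothetical sensor readings
binned = bin_average(noisy, 2)
smoothed = moving_average(noisy, 3)
print(binned)    # [2.0, 3.0, 4.0]
print(smoothed)  # [2.0, 3.0, 3.0, 4.0]
```

Binning shortens the series by a factor of k, while the moving average keeps nearly the original length because its windows overlap.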
---
Question
What are some different ways of Trajectory Patterns Mining?
Answer
There are many different ways in which the problem of trajectory pattern mining may be formulated. This is because of the natural complexity of trajectory data that allows for multiple ways of defining patterns.
Frequent Trajectory Paths
A key problem is that of determining frequent sequential paths in trajectory data. To determine the frequent sequential paths from a set of trajectories, the first step is to convert the multidimensional trajectory (with numeric coordinates) to a 1-dimensional discrete sequence. Once this conversion has been performed, any sequential pattern mining algorithm can be applied to the transformed data.
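The conversion step described above can be sketched with a simple grid: each (x, y) point is replaced by the id of the grid cell it falls in, which turns a trajectory into a 1-dimensional discrete sequence. The grid discretization, the function names, and the sample trajectories are assumptions for illustration, and counting consecutive cell pairs is only a simple stand-in for a full sequential pattern mining algorithm.

```python
from collections import Counter


def trajectory_to_sequence(points, cell_size, grid_width):
    """Map non-negative (x, y) points to grid-cell ids, dropping consecutive repeats."""
    seq = []
    for x, y in points:
        cell = int(y // cell_size) * grid_width + int(x // cell_size)
        if not seq or seq[-1] != cell:
            seq.append(cell)
    return seq


def frequent_transitions(sequences, min_support):
    """Count each consecutive cell pair once per trajectory; keep the frequent ones."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(zip(seq, seq[1:])))
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical trajectories on a 3-cell-wide grid with unit cells.
trajectories = [
    [(0.5, 0.5), (0.7, 0.9), (1.2, 0.4), (1.8, 1.1), (2.5, 1.5)],
    [(0.2, 0.1), (1.5, 0.5), (1.5, 1.5)],
    [(2.5, 0.5), (2.5, 1.5)],
]
sequences = [trajectory_to_sequence(t, cell_size=1.0, grid_width=3) for t in trajectories]
frequent = frequent_transitions(sequences, min_support=2)
print(sequences)  # [[0, 1, 4, 5], [0, 1, 4], [2, 5]]
print(frequent)
```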
Colocation Patterns
Colocation patterns are designed to discover social connections between the trajectories of different individuals. The basic idea of colocation patterns is that individuals who frequently appear at the same point at the same time are likely to be related to one another.
Colocation pattern mining attempts to discover patterns of individuals, rather than patterns of spatial trajectory paths. Because of the complementary nature of this analysis, a vertical representation of the sequence database is particularly convenient.
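One way to read the vertical representation is as an index from each (location, time) pair to the set of individuals present there, which makes counting co-occurrences straightforward. The sketch below follows that reading; the data layout, the function name `colocation_counts`, and the sample visits are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def colocation_counts(visits):
    """Count how often each pair of people shares the same cell at the same time."""
    present = defaultdict(set)  # vertical layout: (cell, time) -> people present
    for person, cell, t in visits:
        present[(cell, t)].add(person)
    pair_counts = Counter()
    for people in present.values():
        for pair in combinations(sorted(people), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical (person, cell id, time step) records.
visits = [
    ("ann", 4, 1), ("bob", 4, 1),
    ("ann", 5, 2), ("bob", 5, 2), ("cat", 5, 2),
    ("cat", 0, 3),
]
counts = colocation_counts(visits)
print(counts.most_common())
```

Pairs with a high count, such as ("ann", "bob") here, are the candidates for a social connection.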
---
▌Difficulty: Hard
Question
Compare State-Space Models and ARIMA models
Answer
ARIMA works like a universal approximator: you do not need to know the true model behind your data, and you use the standard ARIMA diagnostic and fitting tools to approximate it. It is like polynomial curve fitting: whatever the true function is, you can always approximate it with a polynomial of some degree.
Compared to ARIMA, state-space models allow you to model more complex processes, have interpretable structure, and easily handle data irregularities; but for this, you pay with increased complexity of a model, harder calibration, less community knowledge.
Because there is such a great variety of state-space model formulations (a much richer class than ARIMA models), the behavior of all these potential models is not well studied, and if the model you formulate is complicated, it is hard to say how it will behave under different circumstances. Of course, if your state-space model is simple or composed of interpretable components, there is no such problem.
ARIMA is always the same well-studied ARIMA, so its behavior should be easier to anticipate even when you use it to approximate some complex process.
Because the state-space form lets you model complex or nonlinear processes directly and exactly, for such processes you may run into problems with the stability of filtering and prediction (EKF/UKF divergence, particle filter degeneracy). You may also have trouble calibrating the parameters of a complicated model, since that is a computationally hard optimization problem.
ARIMA is simple, has fewer parameters (1 noise source instead of 2 noise sources, no hidden variables) so its calibration is simpler.
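The "two noise sources and a hidden variable" point can be illustrated with the simplest state-space model, the local level model: y_t = mu_t + eps_t and mu_{t+1} = mu_t + eta_t, where the level mu_t is hidden. Its reduced form is an ARIMA(0,1,1) process with a single innovation term. Below is a minimal Kalman filter sketch for this model; the variances, the diffuse starting values, and the sample data are assumptions for illustration.

```python
def local_level_filter(ys, obs_var, state_var, level0=0.0, var0=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    level, var = level0, var0  # large var0 acts as a near-diffuse prior on the level
    filtered = []
    for y in ys:
        # Update: blend the predicted level with the new observation.
        gain = var / (var + obs_var)
        level = level + gain * (y - level)
        var = (1.0 - gain) * var
        filtered.append(level)
        # Predict: the hidden level follows a random walk, so uncertainty grows.
        var = var + state_var
    return filtered


ys = [1.0, 1.2, 0.9, 1.1]  # hypothetical observations
filtered = local_level_filter(ys, obs_var=1.0, state_var=0.1)
print(filtered)
```

Here `obs_var` and `state_var` are the two noise variances that must be calibrated, compared with the single innovation variance (plus one MA coefficient) in the equivalent ARIMA(0,1,1).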
---